Two-stage Character Classification: a Combined Approach of Clustering and Support Vector Classifiers



LOUIS VUURPIJL AND LAMBERT SCHOMAKER

This paper describes a two-stage classification method for (1) classification of isolated characters and (2) verification of the classification result. Character prototypes are generated using hierarchical clustering. For those prototypes known to sometimes produce wrong classification results, a "support vector classifier" (SVC) is trained. The SVC can be used to increase the confidence that a classification is correct and furthermore decide on a classification if the confidence using the standard method is too low. Experiments with the iUF UNIPEN database yield 94% recognition rate. In cases where both classifiers agree, the error rate is zero.

1 Introduction

In handwriting recognition, a standard approach of character classification is to (1) find a number of character prototypes (allographs), based on a set of distinctive features and (2) match unknown characters to the labeled allographs for classification. A wide range of techniques can be used to generate a set of prototypes like clustering methods, nearest-neighbor methods or neural networks. Such methods have the advantage over, e.g., hidden-Markov models or multi-layered perceptrons, in that the prototypes are visible: they are made explicit and the classification performance of each individual prototype can be investigated in detail.

Clustering is a well-known technique for finding a set of prototypes. In [1], a novel clustering technique was described, which obtains a set of prototypes, organized in a hierarchical structure. Each prototype contains members of similar character shapes, and its centroid is defined as the average of its members. Using prototypes for distance-based nearest-centroid matching, classification performances of 86% were achieved, where the classification errors were partly due to labeling errors and mainly due to the confusion that certain combinations of classes introduce.
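The nearest-centroid matching described above can be illustrated with a minimal sketch. It assumes fixed-length numeric feature vectors and Euclidean distance, neither of which is specified in this excerpt; the function names `centroid` and `nearest_centroid` and the toy data are hypothetical and are not taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(members):
    """Average a list of equal-length feature vectors, component by component."""
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


def nearest_centroid(x, prototypes):
    """Return (label, distance) for the prototype centroid closest to x.

    prototypes is a list of (label, centroid) pairs, so one character
    class may own several prototypes (allographs).
    """
    return min(((label, dist(x, c)) for label, c in prototypes),
               key=lambda pair: pair[1])


# Toy clusters standing in for the output of a clustering step (assumed data).
clusters = {
    "a": [[0.0, 0.0], [0.2, 0.0], [0.1, 0.3]],
    "d": [[1.0, 1.0], [1.2, 0.8]],
}
prototypes = [(label, centroid(members)) for label, members in clusters.items()]
label, distance = nearest_centroid([0.9, 1.1], prototypes)
```

Because every prototype is an explicit vector, the errors attributed to a single prototype can be inspected individually, which is the visibility advantage the text refers to.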
The aspect of "zooming in" on confused character classes is further elaborated in [2]. In that paper, a system is described which engages so-called intelligent agents in case of potential confusion. Here, we describe a method of combining the cluster-based prototype matching approach with a class-separation technique called Support Vector Classification [3, 4, 5]. The method is used to (1) increase the confidence that a classification is correct and (2) decide on a classification if the confidence using the nearest-centroid matching is too low.
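One way to picture the two roles of the second stage is the decision logic sketched below. This is an illustration under stated assumptions, not the authors' implementation: the confidence measure (the distance margin between the best prototype and the nearest prototype of a different class), the `threshold` parameter, the status strings, and the `verifiers` mapping are all hypothetical. The second-stage classifier is represented by any callable that returns a label; in the paper that role is played by a trained support vector classifier.

```python
from math import dist  # Euclidean distance, Python 3.8+


def two_stage_classify(x, prototypes, verifiers, threshold):
    """Return (label, status) for feature vector x.

    prototypes: list of (label, centroid) pairs used by stage one.
    verifiers:  dict mapping a stage-one label to a callable(x) -> label,
                present only for classes known to be confused (assumption).
    threshold:  minimum distance margin for stage one to count as confident.
    """
    ranked = sorted((dist(x, c), label) for label, c in prototypes)
    best_distance, first = ranked[0]
    # Margin to the closest prototype that belongs to a different class.
    rival = next((d for d, lab in ranked if lab != first), float("inf"))
    margin = rival - best_distance

    verifier = verifiers.get(first)
    if verifier is None:
        return first, "unverified"   # no second stage for this class
    second = verifier(x)
    if second == first:
        return first, "verified"     # both stages agree: higher confidence
    if margin < threshold:
        return second, "overruled"   # stage one unsure: second stage decides
    return first, "disputed"         # stage one confident but contradicted


# Toy setup: two prototypes and a stand-in verifier for the class "u".
prototypes = [("u", [0.0, 0.0]), ("v", [1.0, 0.0])]
verifiers = {"u": lambda x: "v" if x[0] > 0.4 else "u"}

agree = two_stage_classify([0.1, 0.0], prototypes, verifiers, threshold=0.2)
decide = two_stage_classify([0.45, 0.0], prototypes, verifiers, threshold=0.2)
plain = two_stage_classify([0.9, 0.0], prototypes, verifiers, threshold=0.2)
```

In this sketch `agree` corresponds to case (1), where agreement between the two stages raises confidence, and `decide` corresponds to case (2), where the second stage settles a low-confidence match.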

Publication date: 2000